Abstract

We study data processing inequalities that are derived from a certain class of generalized 
information measures, where a series of convex functions and multiplicative likelihood ratios are 
nested alternately. While these information measures can be viewed as a special case of the 
most general Zakai-Ziv generalized information measure, this special nested structure calls for 
attention and motivates our study. Specifically, a certain choice of the convex functions leads to 
an information measure that extends the notion of the Bhattacharyya distance (or the Chernoff 
divergence): While the ordinary Bhattacharyya distance is based on the (weighted) geometric 
mean of two replicas of the channel's conditional distribution, the more general information 
measure allows an arbitrary number of such replicas. We apply the data processing inequality 
induced by this information measure to a detailed study of lower bounds of parameter estimation 
under additive white Gaussian noise (AWGN) and show that in certain cases, tighter bounds can 
be obtained by using more than two replicas. While the resulting lower bound may not compete 
favorably with the best bounds available for the ordinary AWGN channel, the advantage of the 
new lower bound, relative to the other bounds, becomes significant in the presence of channel 
uncertainty, like unknown fading. This different behavior in the presence of channel uncertainty 
is explained by the convexity property of the information measure. 



Index Terms: Data processing inequality, Chernoff divergence, Bhattacharyya distance, Gallager
function, parameter estimation, fading.

1 Introduction



In classical Shannon theory, data processing inequalities (in various forms) are frequently used
to prove converses to coding theorems and to establish fundamental properties of information
measures, like the entropy, the mutual information, and the Kullback-Leibler divergence [5]. A
very well-known example is the converse to the joint source-channel coding theorem, which sets
the stage for the separation theorem of Information Theory: When a source with rate-distortion
function $R(D)$ is encoded and transmitted across a channel with capacity $C$, the distortion of the
reconstruction at the decoder must obey the inequality $R(D) \le C$, or equivalently, $D \ge R^{-1}(C)$.
This lower bound is achievable (e.g., by separate source coding and channel coding) in the limit of
large block length.

Ziv and Zakai [24] (see also Csiszar [6], [7], [H] for related work) have observed that in order to
obtain a wider class of data processing inequalities, the (negative) logarithm function, which plays a
role in the classical mutual information, can be replaced by an arbitrary convex function $Q$, provided
that it obeys certain regularity conditions. This generalized mutual information, $I_Q(X;Y)$, was
further generalized in [22] to be based on multivariate convex functions, as opposed to the univariate
convex functions in [24]. In analogy to the classical converse to the joint source-channel coding
theorem, one can then define a generalized rate-distortion function $R_Q(D)$ (as the minimum of
the generalized mutual information between the source and the reproduction, s.t. some distortion
constraint) and a generalized channel capacity $C_Q$ (as the maximum generalized mutual information
between the channel input and output) and establish another lower bound on the distortion via
the inequality $R_Q(D) \le C_Q$ that stems from the data processing inequality of $I_Q$. While this lower
bound obviously cannot be tighter than its classical counterpart in the limit of long blocks (which
is asymptotically achievable), Ziv and Zakai have demonstrated that for short block codes (e.g.,
codes of block length 1), sharper lower bounds can certainly be obtained (see also [H] for more
recent developments).

Gurantz, in his M.Sc. work [10] (supervised by Ziv and Zakai), continued the work in [23] in a
specific direction: He constructed a special class of generalized information functionals defined by
iteratively alternating between applications of convex functions and multiplications by likelihood
ratios¹ (or more generally, Radon-Nikodym derivatives). After proving that this functional obeys
a data processing inequality, Gurantz demonstrated how it can be used to improve on the Arimoto
bound for coding above capacity [2] and on the Gallager upper bound of random coding [9] by a
pre-factor of 1/2.

Motivated by the belief that the interesting nested structure of Gurantz' information functional 
can be further exploited, we continue, in this work, to investigate this information measure and we 
further study its properties and potential. 

We begin by putting the Gurantz' functional in the broader perspective of the other information 
measures due to Ziv and Zakai [22], [23] (Section 2). Specifically, we first discuss two possible 
methods to define a generalized mutual information from the Gurantz' functional, each one with its 
advantages and disadvantages. We then show that both of these generalized mutual informations 
can be viewed as special cases of the generalized mutual information of [22], which is based on 
multivariate convex functions. The proof of this fact then naturally suggests a way to broaden the 
scope and define a family of information measures with a tree structure of convex functions and 
likelihood ratios. 

We then focus on a concrete choice of the convex functions (Section 3) in the Gurantz' information
measure (in particular, power functions), which turns out to yield an information measure that
extends the notion of the Bhattacharyya distance (or the Chernoff divergence): While the ordinary
Bhattacharyya distance is based on the (weighted) geometric mean of two replicas of the channel's
conditional distribution (see, e.g., [17, eq. (2.3.15)]), the more general information measure
considered here allows an arbitrary number of such replicas. This generalized Bhattacharyya distance is
also intimately related to the Gallager function $E_0(\rho, Q)$ [9], [17], which is indeed another
information measure obeying a data processing inequality [13, Proposition 2], since it is yet another special
case of the information measures in [22].

Finally, we apply the data processing inequality, induced by the above described generalized
Bhattacharyya distance, to a detailed study of lower bounds on parameter estimation under additive
white Gaussian noise (AWGN) and show that in certain cases, tighter bounds can be obtained by
using more than two replicas (Section 4). In this particular case, it turns out that three is the
optimum number of replicas in the high SNR regime. While the resulting lower bound may still not
compete favorably with the best available bounds for the ordinary AWGN channel, the advantage
of the new lower bound, relative to the other bounds, becomes apparent in the presence of channel
uncertainty, like in the case of an AWGN channel with unknown fading. This different behavior,
in the presence of channel uncertainty, is explained by the convexity property of the information
measure.

¹ The exact form of this will be given in the sequel.

2 Preliminaries and Basic Observations 

In [10], a generalized information functional was defined in the following manner: Let $X$ and $Y$ be
random variables taking on values in alphabets $\mathcal{X}$ and $\mathcal{Y}$, respectively, where here and throughout
the sequel, all alphabets may either be finite, countably infinite, or uncountably infinite, like
intervals or the entire real line. Let $x_1, x_2, \ldots, x_k$ be a given list of symbols (possibly with repetitions)
from $\mathcal{X}$. Let $Q_1, Q_2, \ldots, Q_k$ be a collection of univariate functions, defined on the positive reals,
with the following properties, holding for all $i$:

1. $\lim_{t\to 0} t\,Q_i(1/t) = 0$.

2. $|Q_i(0)| < \infty$.

3. Either the function $\hat{Q}_i = Q_1 \circ Q_2 \circ \cdots \circ Q_i$ is monotonically non-decreasing and $Q_{i+1}$ is
convex, or $\hat{Q}_i$ is monotonically non-increasing and $Q_{i+1}$ is concave (here, the notation $\circ$
means function composition).

Now, define the Gurantz' functional as

$$G(Y|x,x_1,\ldots,x_k) = \int_{\mathcal{Y}} dy\, P_{Y|X}(y|x) \times Q_1\left(\frac{P_{Y|X}(y|x_1)}{P_{Y|X}(y|x)}\, Q_2\left(\frac{P_{Y|X}(y|x_2)}{P_{Y|X}(y|x_1)}\, Q_3\left(\cdots Q_k\left(\frac{P_{Y|X}(y|x_k)}{P_{Y|X}(y|x_{k-1})}\right)\cdots\right)\right)\right),$$

where here and throughout, it is understood that integrals and probability density functions should
be replaced, in the countable alphabet case, by summations and probability mass functions,
respectively.
The data processing inequality associated with the Gurantz' functional is the following: Let
$X \to Y \to Z$ be a Markov chain and let $Q_1$ be a convex function which, together with $Q_2, \ldots, Q_k$,
complies with rules 1-3 above. Then,

$$G(Y|x,x_1,\ldots,x_k) \ge G(Z|x,x_1,\ldots,x_k). \tag{1}$$

The direct proof of this inequality is fairly straightforward [10]: First, observe that

$$G(Y|x,x_1,\ldots,x_k) = G(Y,Z|x,x_1,\ldots,x_k) \tag{2}$$

due to the Markov property. Then, one can easily obtain a sequence of lower bounds on the
right-hand side (r.h.s.) of eq. (2) by successive applications of Jensen's inequality, where at each stage,
the expectation with respect to (w.r.t.) $P_{Y|X,Z}(\cdot|x_i,z)$ propagates into the next convex function and then
partially cancels out with the factor $P_{Y,Z|X}(y,z|x_i)$ at the denominator of the likelihood ratio.
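The inequality (1) can be checked numerically on a small finite-alphabet example. The sketch below is only an illustration under stated assumptions: it uses the power functions $Q_1(t) = -t^{a_1}$ and $Q_i(t) = t^{a_i}$ for $i \ge 2$ (one admissible choice satisfying rules 1-3), randomly drawn strictly positive channels, and hypothetical helper names (`random_channel`, `cascade`, `gurantz_power`) that do not appear in [10].

```python
import random

def random_channel(rng, n_in, n_out):
    """Row-stochastic matrix with strictly positive entries (hypothetical test channel)."""
    rows = []
    for _ in range(n_in):
        w = [rng.random() + 0.05 for _ in range(n_out)]
        s = sum(w)
        rows.append([v / s for v in w])
    return rows

def cascade(P, W):
    """Channel X -> Z obtained by passing Y through W, so that X -> Y -> Z is Markov."""
    return [[sum(P[x][y] * W[y][z] for y in range(len(W)))
             for z in range(len(W[0]))] for x in range(len(P))]

def gurantz_power(P, xs, a):
    """G(Y|x_0,...,x_k) for P[x][y], with Q_1(t) = -t**a[0] and Q_i(t) = t**a[i-1], i >= 2.

    xs = [x_0, x_1, ..., x_k]; x_0 is the symbol that controls the averaging.
    """
    k = len(a)
    total = 0.0
    for y in range(len(P[0])):
        inner = 1.0
        # Build the nest from the innermost function Q_k outward to Q_1.
        for i in range(k, 0, -1):
            ratio = P[xs[i]][y] / P[xs[i - 1]][y]
            inner = (ratio * inner) ** a[i - 1]
        total += P[xs[0]][y] * (-inner)   # the minus sign belongs to Q_1
    return total

rng = random.Random(0)
P = random_channel(rng, 4, 5)   # channel from X to Y
W = random_channel(rng, 5, 3)   # channel from Y to Z
xs = [0, 1, 2, 3]               # x_0, x_1, x_2, x_3 (k = 3)
a = [0.7, 0.5, 0.4]
g_y = gurantz_power(P, xs, a)
g_z = gurantz_power(cascade(P, W), xs, a)
print(g_y, g_z)                 # g_y should not be smaller than g_z
```

For this particular choice of functions the inequality reduces to Hölder's inequality, so the check is a sanity test of the statement rather than a substitute for the proof.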

Note that according to the definition of $G(Y|x,x_1,\ldots,x_k)$, $x$ is the variable that controls
the distribution of $Y$ (as the averaging is w.r.t. $P_{Y|X}(\cdot|x)$), whereas $x_1,\ldots,x_k$ can be viewed as
'dummy' variables. One way to define a generalized mutual information based on $G$, which is
a functional of $\{P_{XY}(x,y)\}$, is by assigning a certain probability distribution to $(x,x_1,\ldots,x_k)$.
Let $P(x,x_1,\ldots,x_k) = P_X(x)P(x_1,\ldots,x_k|x)$, where $P_X(\cdot)$ is the actual distribution of the random
variable $X$ and $P(x_1,\ldots,x_k|x)$ is an arbitrary conditional distribution of $(X_1,\ldots,X_k)$ given $X = x$,
for example, $P(x_1,\ldots,x_k|x) = \prod_{i=1}^k P_X(x_i)$ or $P(x_1,\ldots,x_k|x) = \prod_{i=1}^k \delta(x_i - f_i(x))$ for some
deterministic functions $\{f_i\}$. Now, for a given choice of $\{P(x_1,\ldots,x_k|x)\}$, the Gurantz' mutual
information $I_G(X;Y)$ can be defined as

$$I_G(X;Y) = E\,G(Y|X,X_1,\ldots,X_k), \tag{3}$$

where the expectation is w.r.t. the above defined joint distribution of the random variables
$X, X_1, \ldots, X_k$. This generalized mutual information is now a well-defined functional of $P_{XY} = P_X \times P_{Y|X}$. In
principle, one may apply the generalized data processing inequality $I_G(X;Y) \ge I_G(X;Z)$ for any
given choice of $\{P(x_1,\ldots,x_k|x)\}$ (consider these as parameters) and then optimize the resulting
distortion bound w.r.t. the choice of these parameters.

Our first observation is that $I_G(X;Y)$ is a special case of the Zakai-Ziv generalized mutual
information [22], defined as

$$I_{ZZ}(X;Y) = E\,Q\left(\frac{\mu_1(X,Y)}{P_{XY}(X,Y)}, \ldots, \frac{\mu_k(X,Y)}{P_{XY}(X,Y)}\right), \tag{4}$$

where $Q$ is a multivariate convex function of $k$ variables and $\mu_i(\cdot,\cdot)$, $i = 1,2,\ldots,k$, are arbitrary
measures on $\mathcal{X} \times \mathcal{Y}$.

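To see why this is true, consider the following: For each convex (resp., concave) function $Q_i(t)$,
define the bivariate perspective function $\tilde{Q}_i(s,t) = s \cdot Q_i(t/s)$, where $s > 0$, which is a convex (resp.,
concave) function as well, and jointly in both variables O Subsection 3.2.6]. Thus,

$$\begin{aligned}
G(Y|x,x_1,\ldots,x_k)
&= \int_{\mathcal{Y}} dy\, P_{Y|X}(y|x)\, Q_1\left(\frac{P_{Y|X}(y|x_1)}{P_{Y|X}(y|x)}\, Q_2(\ldots)\right) \\
&= \int_{\mathcal{Y}} dy\, P_{Y|X}(y|x') \cdot \frac{P_{Y|X}(y|x)}{P_{Y|X}(y|x')}\, Q_1\left(\frac{P_{Y|X}(y|x_1)/P_{Y|X}(y|x')}{P_{Y|X}(y|x)/P_{Y|X}(y|x')}\, Q_2(\ldots)\right) \\
&= \int_{\mathcal{Y}} dy\, P_{Y|X}(y|x')\, \tilde{Q}_1\left(\frac{P_{Y|X}(y|x)}{P_{Y|X}(y|x')},\ \frac{P_{Y|X}(y|x_1)}{P_{Y|X}(y|x')}\, Q_2(\ldots)\right) \\
&= \int_{\mathcal{Y}} dy\, P_{Y|X}(y|x')\, \tilde{Q}_1\left(\frac{P_{Y|X}(y|x)}{P_{Y|X}(y|x')},\ \tilde{Q}_2\left(\frac{P_{Y|X}(y|x_1)}{P_{Y|X}(y|x')},\ \frac{P_{Y|X}(y|x_2)}{P_{Y|X}(y|x')}\, Q_3(\ldots)\right)\right) \\
&= \cdots \\
&= \int_{\mathcal{Y}} dy\, P_{Y|X}(y|x')\, \tilde{Q}_1\left(\frac{P_{Y|X}(y|x)}{P_{Y|X}(y|x')},\ \tilde{Q}_2\left(\frac{P_{Y|X}(y|x_1)}{P_{Y|X}(y|x')},\ \ldots\ \tilde{Q}_k\left(\frac{P_{Y|X}(y|x_{k-1})}{P_{Y|X}(y|x')},\ \frac{P_{Y|X}(y|x_k)}{P_{Y|X}(y|x')}\right)\ldots\right)\right). \tag{5}
\end{aligned}$$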

Now, under the assumed properties of the functions $\{Q_i\}$, it is easy to see that

$$Q(t_0,t_1,\ldots,t_k) = \tilde{Q}_1(t_0, \tilde{Q}_2(t_1, \tilde{Q}_3(t_2, \ldots \tilde{Q}_k(t_{k-1},t_k)\ldots))) \tag{6}$$

is jointly convex in $(t_0,t_1,\ldots,t_k)$. Thus, upon taking the expectation of the last line of (5) w.r.t.
$P_X(x')$, we have (after multiplying the numerator and the denominator of each likelihood ratio by
$P_X(x')$) that $E\,G(Y|X,x_1,\ldots,x_k)$ is an instance of $I_{ZZ}(X;Y)$ for every given $(x_1,\ldots,x_k)$, with
the assignments $\mu_i(x,y) = P_X(x)P_{Y|X}(y|x_i)$, $i = 1,2,\ldots,k$.

We can represent the general structure of information functionals, such as $I_G$ and $I_{ZZ}$, as well
as the forms in the different lines of eq. (5), graphically, in terms of factor trees (i.e., factor graphs
which are trees) that obey the following rules.

1. There are two types of nodes, variable nodes and function nodes, and each edge of the tree
connects a variable node and a function node.

2. The root of the tree is a function node whereas the leaves are variable nodes.

3. Each function node is represented by a convex function $Q_i$ and each variable node is
represented by a likelihood ratio $p(y|x_k)/p(y|x_l)$, whose shorthand notation here will be $L_{k,l}$.

4. There is a directed edge from function node $Q_i$ to variable node $L_{j,l}$ (denoted $Q_i \to L_{j,l}$) if
the information measure includes a product of the form $Q_i(\cdot) \cdot L_{j,l}$.

5. There is a directed edge from variable node $L_{i,j}$ to function node $Q_k$ (denoted $L_{i,j} \to Q_k$) if
$L_{i,j}$ multiplies an argument of $Q_k$.

6. For every path $L_{i,j} \to Q_k \to L_{l,m}$, $j$ must be equal to $l$ (namely, $x_j = x_l$).

7. For all direct offsprings of the root, $\{L_{i,j}\}$, the second subscript $j$ is the same.

Now observe that $I_G$ and $I_{ZZ}$ correspond to two extreme cases: While $I_{ZZ}$ corresponds to a factor
tree where all $k$ leaves are connected directly to the root, $I_G$ corresponds to a simple chain (i.e.,
every node has one offspring and there is only one leaf), which alternates between variable nodes
and function nodes. The form that appears in the last line of (5) corresponds to a binary tree with
a comb structure, i.e., every node that is not a leaf has two offsprings, one of which is a leaf. More
generally, every factor graph with a tree structure that complies with the above rules corresponds
to a valid information measure that satisfies a data processing inequality. For example, the factor
graph of Fig. 1 corresponds to the information measure




Figure 1: The factor graph that represents the generalized mutual information of eq. (7).

In view of the observation that $E\,G(Y|X,x_1,\ldots,x_k)$ is a special case of $I_{ZZ}(X;Y)$, there is
another way to use it to obtain data processing inequalities for communication systems. According
to [22, Theorems 3.1 and 5.1], the following is true: Let $U \to X \to Y$ be a Markov chain and let
$V = g(Y)$ where $g$ is a deterministic function. Let $\mu_i(x,y)$, $i = 1,2,\ldots,k$, be arbitrary measures and
define $\mu_i(u,y) = P_U(u)\sum_x P_{X|U}(x|u)\mu_i(x,y)/P_X(x)$, $\mu_i(u,v) = \sum_{y:\ g(y)=v} \mu_i(u,y)$, $i = 1,\ldots,k$.
Then,

$$I_{ZZ}(X;Y) \ge I_{ZZ}(U;V). \tag{8}$$

As described informally in the Introduction, the maximum of the left-hand side (l.h.s.) over $P_X$ and
the minimum of the r.h.s. over $P_{V|U}$ (subject to some distortion constraint) can be thought of as
generalized channel capacity and generalized rate-distortion function, respectively, as in [22]. Now,
consider the special case where $I_{ZZ}$ is based on a multivariate convex function $Q$ as defined in (6),
where each bivariate convex function $\tilde{Q}_i$ is the perspective of a certain univariate convex function,
i.e., $\tilde{Q}_i(s,t) = s \cdot Q_i(t/s)$. Then by a similar argument as above (going the other direction), we get
another information measure in the spirit of Gurantz:



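$$I_G(X;Y) = \int_{\mathcal{X}\times\mathcal{Y}} dx\,dy\, P_{XY}(x,y)\, Q_1\left(\frac{\mu_1(x,y)}{P_{XY}(x,y)}\, Q_2\left(\frac{\mu_2(x,y)}{\mu_1(x,y)} \cdots Q_k\left(\frac{\mu_k(x,y)}{\mu_{k-1}(x,y)}\right)\cdots\right)\right). \tag{9}$$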

Since it is a special case of $I_{ZZ}(X;Y)$, it obviously satisfies a strong² data processing inequality
$I_G(X;Y) \ge I_G(U;V)$. Assuming, in addition, that the encoder is given by a deterministic function
$x = f(u)$, we can choose $\mu_i(x,y) = P_X(x)P_{Y|X}(y|x_i)$, where $x_i = f(u_i)$ is a specific member in $\mathcal{X}$,
and then $\mu(y|u_i) = P_{Y|X}(y|f(u_i))$. We then obtain

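$$\begin{aligned}
&\int_{\mathcal{X}\times\mathcal{Y}} dx\,dy\, P_{XY}(x,y)\, Q_1\left(\frac{P_{Y|X}(y|f(u_1))}{P_{Y|X}(y|x)}\, Q_2\left(\cdots Q_k\left(\frac{P_{Y|X}(y|f(u_k))}{P_{Y|X}(y|f(u_{k-1}))}\right)\cdots\right)\right) \\
&\quad \ge \int_{\mathcal{U}\times\mathcal{V}} du\,dv\, P_{UV}(u,v)\, Q_1\left(\frac{P_{V|U}(v|u_1)}{P_{V|U}(v|u)}\, Q_2\left(\cdots Q_k\left(\frac{P_{V|U}(v|u_k)}{P_{V|U}(v|u_{k-1})}\right)\cdots\right)\right). \tag{10}
\end{aligned}$$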

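Multiplying both sides by $\prod_{i=1}^k P_U(u_i)$ and integrating over $\{u_i\}$, we get

$$E\,Q_1\left(\frac{P_{Y|X}(Y|X_1)}{P_{Y|X}(Y|X)}\, Q_2\left(\cdots Q_k\left(\frac{P_{Y|X}(Y|X_k)}{P_{Y|X}(Y|X_{k-1})}\right)\cdots\right)\right) \ge E\,Q_1\left(\frac{P_{V|U}(V|U_1)}{P_{V|U}(V|U)}\, Q_2\left(\cdots Q_k\left(\frac{P_{V|U}(V|U_k)}{P_{V|U}(V|U_{k-1})}\right)\cdots\right)\right), \tag{11}$$

where the expectation on the l.h.s. is w.r.t. $P_{XY}(x,y)\prod_i P_X(x_i)$, and the expectation on the r.h.s.
is w.r.t. $P_{UV}(u,v)\prod_i P_U(u_i)$. This is different from the data processing theorem in [10], because it
allows 'moving' in both directions of the Markov chain and not only to the right.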



² By "strong data processing inequality" we use the terminology of [22], meaning that for a Markov chain
$U \to X \to Y$ and $V = g(Y)$, we have $I_G(X;Y) \ge I_G(U;Y) \ge I_G(U;V)$.
To summarize, we have seen two approaches to derive data processing inequalities from the
inequality $G(Y|u,u_1,\ldots,u_k) \ge G(V|u,u_1,\ldots,u_k)$ for a Markov chain $U \to Y \to V$ (where we have
slightly changed the notation relative to eq. (1)): According to the first approach, one allows an
arbitrary distribution $P_{U_1,\ldots,U_k|U}$ and averages both sides w.r.t. $P_U \times P_{U_1,\ldots,U_k|U}$. This defines
$I_G(U;Y)$ and $I_G(U;V)$ as functionals of $P_{UY}$ and $P_{UV}$, respectively, where $P_{U_1,\ldots,U_k|U}$ serves as a set of free
parameters that can be optimized to get the tightest distortion bound. The advantage of this
approach is the free choice of $P_{U_1,\ldots,U_k|U}$, which gives many degrees of freedom. The disadvantage
is that $I_G(U;Y)$ depends on the source and the encoder and there is no apparent way to prove
a strong data processing theorem, in general, i.e., to prove that $I_G(U;Y)$ can be further upper
bounded by $I_G(X;Y)$ (whatever its definition may be) and thereby define a channel capacity
that is independent of the source (in addition to a generalized rate-distortion function, which is
$\min I_G(U;V)$ s.t. some distortion constraint). The inequality $I_G(U;Y) \ge I_G(U;V)$ is relevant to
situations where there is no encoder to be optimized, namely, when the channel from $U$ to $Y$ is
given and cannot be shaped by encoding. This happens, for example, in parameter estimation
problems.

According to the second approach, one limits $P_{U_1,\ldots,U_k|U}(u_1,\ldots,u_k|u)$ to be $\prod_{i=1}^k P_U(u_i)$. This
leaves no degrees of freedom, but it admits a strong data processing theorem, and hence allows one
to define both a generalized rate-distortion function and a generalized channel capacity, whose
calculations are completely decoupled from each other. It is also much simpler to use. This type of
data processing inequality is more suitable for coded communication systems, where there is also
an encoder to optimize.

From this point onward, we essentially confine ourselves to the second option, mainly for reasons
of simplicity.

3 Choice of the Convex Functions 

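An interesting and convenient choice of the functions $\{Q_i\}$ is the following: $Q_1(t) = -t^{a_1}$, and
$Q_i(t) = t^{a_i}$ for $i \ge 2$, where $0 \le a_i \le 1$, $i = 1,\ldots,k$. In this case,
$\hat{Q}_i(t) = (Q_1 \circ \cdots \circ Q_i)(t) = -t^{\prod_{j=1}^{i} a_j}$ is monotonically
decreasing and $Q_{i+1}$ is concave, so this choice complies with the rules. In this case, we have:

$$\begin{aligned}
G(Y|x_0,x_1,\ldots,x_k)
&= -\int_{\mathcal{Y}} dy\, P_{Y|X}(y|x_0) \times \left[\frac{P_{Y|X}(y|x_1)}{P_{Y|X}(y|x_0)}\left[\frac{P_{Y|X}(y|x_2)}{P_{Y|X}(y|x_1)}\left[\cdots\left[\frac{P_{Y|X}(y|x_k)}{P_{Y|X}(y|x_{k-1})}\right]^{a_k}\cdots\right]^{a_3}\right]^{a_2}\right]^{a_1} \\
&= -\int_{\mathcal{Y}} dy \prod_{i=0}^{k}\left[P_{Y|X}(y|x_i)\right]^{b_i}, \tag{12}
\end{aligned}$$

where $\{b_i\}$ are given by:

$$\begin{aligned}
b_0 &= 1 - a_1 \\
b_1 &= (1 - a_2)a_1 \\
b_2 &= (1 - a_3)a_1 a_2 \\
&\ \ \vdots \\
b_{k-1} &= (1 - a_k)\prod_{i=1}^{k-1} a_i \\
b_k &= \prod_{i=1}^{k} a_i. \tag{13}
\end{aligned}$$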

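Note that the coefficients $b_0,\ldots,b_k$ are all non-negative and their sum is equal to 1. Conversely,
for every set of coefficients $\{b_i\}$ with these properties, one can find $a_1,\ldots,a_k$, all in $[0,1]$, using the
following inverse transformation:

$$\begin{aligned}
a_1 &= 1 - b_0 \\
a_2 &= 1 - \frac{b_1}{1 - b_0} \\
&\ \ \vdots \\
a_k &= 1 - \frac{b_{k-1}}{1 - \sum_{i=0}^{k-2} b_i}. \tag{14}
\end{aligned}$$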
This allows us to parametrize the information measure directly in terms of an arbitrary set of
non-negative numbers $\{b_i\}$ summing to unity, without worrying about $\{a_i\}$. The resulting
information measure can then be viewed as an extension of the Chernoff divergence between two
conditional densities, $P_{Y|X}(y|x_0)$ and $P_{Y|X}(y|x_1)$, to a general number of densities, where the powers
of $\{P_{Y|X}(y|x_i)\}$ always sum up to unity. Specializing this to the case $b_i = 1/(k+1)$ for all
$i = 0,1,\ldots,k$, eq. (12) extends the Bhattacharyya distance. Following the discussion of the second
option at the end of Section 2, if, in addition, we assign
$P_{X_1,\ldots,X_k|X_0}(x_1,\ldots,x_k|x_0) = \prod_{i=1}^{k} P_X(x_i)$,
then $I_G(X;Y) = E\,G(Y|X_0,X_1,\ldots,X_k) = -e^{-E_0(k,P_X)}$, where $E_0$ is the Gallager function [9]

$$E_0(\rho,P_X) = -\ln \int_{\mathcal{Y}} dy \left[\int_{\mathcal{X}} dx\, P_X(x)\, P_{Y|X}^{1/(1+\rho)}(y|x)\right]^{1+\rho}. \tag{15}$$

Thus, $I_G(X;Y)$ extends, not only the Chernoff divergence, but also the Gallager function, albeit
only at integer values of the parameter $\rho$. Indeed, it was shown in [13, Proposition 2] that the
Gallager function (for every real $\rho > 0$) satisfies a data processing inequality, because it is also a
special case of $I_{ZZ}(X;Y)$. In other words, the generalized Chernoff divergence can be obtained as
a special case of $I_{ZZ}(X;Y)$ in two different ways: one is via $I_G$ and the other is via the Gallager
function. The advantage of working with Gallager's function for integer values of $\rho$ is that an
integral raised to an integer power $(k+1)$ can be expressed in terms of $(k+1)$-dimensional integration
over the $(k+1)$ replicas, $x_0,x_1,\ldots,x_k$, that in turn can be commuted with the additional outermost
integration over $\mathcal{Y}$. In some situations, this enables explicit calculations more conveniently.
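The change of parameters between the exponents $\{a_i\}$ of the power functions and the weights $\{b_i\}$ of the generalized Chernoff divergence, eqs. (13) and (14), can be sketched in a few lines. The function names `a_to_b` and `b_to_a` are illustrative only, and the inverse map assumes that $b_k > 0$ so that no denominator vanishes.

```python
import math

def a_to_b(a):
    """Forward map (13): exponents a_1..a_k -> weights b_0..b_k."""
    b = []
    prod = 1.0                           # running product a_1 * ... * a_{i-1}
    for ai in a:
        b.append((1.0 - ai) * prod)      # b_{i-1} = (1 - a_i) * a_1 ... a_{i-1}
        prod *= ai
    b.append(prod)                       # b_k = a_1 * ... * a_k
    return b

def b_to_a(b):
    """Inverse map (14): weights b_0..b_k -> exponents a_1..a_k (assumes b_k > 0)."""
    a = []
    remaining = 1.0                      # 1 - (b_0 + ... + b_{i-2})
    for bi in b[:-1]:
        a.append(1.0 - bi / remaining)
        remaining -= bi
    return a

# Bhattacharyya-type choice with three replicas (k = 2): equal weights 1/3.
b_uniform = [1.0 / 3.0] * 3
a_uniform = b_to_a(b_uniform)            # a_1 = 2/3, a_2 = 1/2 up to rounding
b_back = a_to_b(a_uniform)
print(a_uniform, b_back)
```

Working directly with the weights is usually more convenient, since the only constraints on them are non-negativity and summing to one.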

4 Application to Estimation Theory 

In this section, we apply the data processing inequality associated with the generalized
Bhattacharyya distance to obtain a Bayesian lower bound on the estimation error of estimators
of a parameter $u$ modulated in a signal $x(t,u)$ that is in turn corrupted by Gaussian white
noise. As mentioned earlier, we essentially adopt the second approach discussed at the end of
Section 2: Although we use the data processing inequality $I_G(U;V) \le I_G(U;Y)$ in some of our
derivations, we eventually further upper bound $I_G(U;Y)$ by a universal bound that is independent
of the modulation scheme $x(t,\cdot)$, so in a way, it conveys the notion of generalized capacity. The
model we focus on is the following.

The source symbol $U$, which is uniformly distributed in $\mathcal{U} = [-1/2,+1/2]$, plays the role of a
random parameter to be estimated. For reasons of convenience, we define the distortion measure
between a realization $u$ of the source and an estimate $v$ (both in $\mathcal{U}$) as

$$d(u,v) = [(u - v) \bmod 1]^2, \tag{16}$$

where

$$t \bmod 1 \triangleq \left\langle t + \tfrac{1}{2} \right\rangle - \tfrac{1}{2}, \tag{17}$$

$\langle r \rangle$ being the fractional part of $r$, that is, $\langle r \rangle = r - \lfloor r \rfloor$. Note that in the high-resolution limit
(corresponding to the high signal-to-noise ratio (SNR) limit), the modulo 1 operation has a negligible
effect, and hence $d(u,v)$ becomes essentially equivalent to the ordinary quadratic distortion. Indeed,
most of our results in the sequel refer to the high SNR regime. At any rate, under the modulo 1
quadratic distortion measure, it is convenient to visualize $U$ as being evenly distributed across the
circumference of a circle of radius $1/(2\pi)$ (or as a phase parameter) and then $d(u,v)$ is the squared
length of the shorter arc (or the smaller angle) between the two corresponding points on the circle.
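A minimal sketch of the distortion measure of eqs. (16) and (17) follows, assuming the centered modulo operation $t \bmod 1 = \langle t + 1/2 \rangle - 1/2$ that maps into $[-1/2, 1/2)$. The function names are illustrative.

```python
import math

def mod1(t):
    """Centered modulo-1 operation: <t + 1/2> - 1/2, with <r> = r - floor(r)."""
    r = t + 0.5
    return (r - math.floor(r)) - 0.5

def distortion(u, v):
    """Squared length of the shorter arc between u and v on a circle of unit circumference."""
    return mod1(u - v) ** 2

# u = 0.4 and v = -0.4 are only 0.2 apart on the circle, although |u - v| = 0.8.
print(distortion(0.4, -0.4), distortion(0.1, 0.3))
```

Both calls above give a value close to 0.04, which illustrates that the wrap-around only matters when the estimate and the parameter are far apart on the interval.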

The channel is assumed to be an AWGN channel, namely, the channel output is given by

$$y(t) = x(t,u) + n(t), \quad 0 \le t \le T, \tag{18}$$

where $x(t,u)$ is an arbitrary waveform of unlimited bandwidth, parametrized by $u$, and $n(t)$ is
AWGN with two-sided spectral density $N_0/2$. The energy

$$E = \int_0^T x^2(t,u)\,dt \tag{19}$$

is assumed to be independent of $u$ (for reasons of simplicity). The estimator $v$ is assumed to be a
functional of the channel output waveform $\{y(t),\ 0 \le t \le T\}$.

Before deriving lower bounds on the estimation error, $E\,d(U,V)$, we first need to derive the
generalized rate-distortion function and the generalized channel capacity pertaining to the generalized
Bhattacharyya distance. This will be done in the next two subsections.

4.1 Derivation of R(D)

The "rate-distortion function" $R(D)$ w.r.t. the information measure under discussion is given by
the minimum of

$$I(U;V) = -\int_{-1/2}^{+1/2} dv \left[\int_{-1/2}^{+1/2} du\, P_{V|U}^{1/(k+1)}(v|u)\right]^{k+1}$$

subject to the constraints $E\,d(U,V) \le D$ and $\int_{-1/2}^{+1/2} dv\, P_{V|U}(v|u) = 1$. As explained in [23], it is
enough to consider channels of the form $P_{V|U}(v|u) = f(v - u)$. Defining $w = (v - u) \bmod 1$, the
problem is then equivalent to

$$\begin{aligned}
\max\ & \int_{-1/2}^{+1/2} dw\, f^{1/(k+1)}(w) \\
\text{s.t.}\ & \int_{-1/2}^{+1/2} dw\, w^2 f(w) \le D \\
& \int_{-1/2}^{+1/2} dw\, f(w) = 1. \tag{20}
\end{aligned}$$

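This problem is easily solved using calculus of variations [1]. Suppose that $f^*$ is the optimum
density and let $f = f^* + \delta g$, where $g$ satisfies

$$\int_{-1/2}^{+1/2} dw\, g(w) = 0. \tag{21}$$

Defining the Lagrangian

$$J(f) = \int_{-1/2}^{+1/2} dw\, f^{1/(k+1)}(w) + \lambda \int_{-1/2}^{+1/2} dw\, w^2 f(w) + \nu \int_{-1/2}^{+1/2} dw\, f(w), \tag{22}$$

the condition for $f^*$ being an extremum is $\partial J(f^* + \delta g)/\partial\delta|_{\delta=0} = 0$ for all $g$. Now,

$$\left.\frac{\partial J(f^* + \delta g)}{\partial \delta}\right|_{\delta=0} = \int_{-1/2}^{+1/2} dw\, g(w) \left[\frac{1}{(k+1)[f^*(w)]^{k/(k+1)}} + \lambda w^2 + \nu\right]. \tag{23}$$

For this integral to vanish for every $g$, one must have

$$\frac{1}{(k+1)[f^*(w)]^{k/(k+1)}} + \lambda w^2 + \nu = \text{const}. \tag{24}$$

This means that $f^*$ is of the form

$$f^*(w) = \frac{C(s)}{(1 + sw^2)^{1+1/k}}, \tag{25}$$

where

$$C(s) = \left[\int_{-1/2}^{+1/2} \frac{dw}{(1 + sw^2)^{1+1/k}}\right]^{-1} \tag{26}$$

and the parameter $s$ is determined such that

$$C(s) \int_{-1/2}^{+1/2} \frac{w^2\,dw}{(1 + sw^2)^{1+1/k}} = D. \tag{27}$$

Define also

$$F(s) = \int_{-1/2}^{+1/2} \frac{w^2\,dw}{(1 + sw^2)^{1+1/k}}. \tag{28}$$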
Let us then denote $D_s = C(s)F(s)$. Then,

$$\begin{aligned}
-R(D_s) &= \left[\int_{-1/2}^{+1/2} dw\, [f^*(w)]^{1/(k+1)}\right]^{k+1} \\
&= C(s)\left[\int_{-1/2}^{+1/2} \frac{dw}{(1 + sw^2)^{1/k}}\right]^{k+1} \\
&= C(s)[G(s)]^{k+1}, \tag{29}
\end{aligned}$$

where we have defined

$$G(s) = \int_{-1/2}^{+1/2} \frac{dw}{(1 + sw^2)^{1/k}}. \tag{30}$$


To summarize, we have obtained a parametric representation of $R(D)$ via the variable $s$:

$$D_s = C(s)F(s) \tag{31}$$
$$R(D_s) = -C(s)[G(s)]^{k+1}. \tag{32}$$

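For later use, we point out that the functions $C(s)$, $F(s)$, and $G(s)$ are intimately related. First,
observe that

$$G(s) = \int_{-1/2}^{+1/2} \frac{(1 + sw^2)\,dw}{(1 + sw^2)^{1+1/k}} = \frac{1}{C(s)} + sF(s). \tag{33}$$

Also, using integration by parts,

$$G(s) = \left. w(1 + sw^2)^{-1/k}\right|_{-1/2}^{+1/2} + \frac{2s}{k} \cdot F(s) = \left(1 + \frac{s}{4}\right)^{-1/k} + \frac{2s}{k} \cdot F(s). \tag{34}$$

Thus,

$$\frac{1}{C(s)} + sF(s) = \left(1 + \frac{s}{4}\right)^{-1/k} + \frac{2s}{k} \cdot F(s), \tag{35}$$

which gives a direct relationship between $C(s)$ and $F(s)$ whenever $k \ne 2$. For $k = 2$, the terms
pertaining to $F(s)$ cancel out, but we then have an explicit formula for $C(s)$.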
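The parametric representation (31)-(32) and the relations (33)-(35) lend themselves to a quick numerical check. The sketch below is an illustration only: it evaluates $C(s)$, $F(s)$ and $G(s)$ with a composite Simpson rule (a choice made here for self-containment, not part of the derivation), and the helper names are hypothetical.

```python
import math

def simpson(func, lo, hi, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3.0

def cfg(k, s):
    """Return C(s), F(s), G(s) of eqs. (26), (28), (30) for a given k."""
    p = 1.0 + 1.0 / k
    c_inv = simpson(lambda w: (1.0 + s * w * w) ** (-p), -0.5, 0.5)
    F = simpson(lambda w: w * w * (1.0 + s * w * w) ** (-p), -0.5, 0.5)
    G = simpson(lambda w: (1.0 + s * w * w) ** (-1.0 / k), -0.5, 0.5)
    return 1.0 / c_inv, F, G

def rd_point(k, s):
    """One point (D_s, R(D_s)) on the parametric curve of eqs. (31)-(32)."""
    C, F, G = cfg(k, s)
    return C * F, -C * G ** (k + 1)

k, s = 3, 10.0
C, F, G = cfg(k, s)
lhs_33 = 1.0 / C + s * F                                      # r.h.s. of eq. (33)
rhs_34 = (1.0 + s / 4.0) ** (-1.0 / k) + (2.0 * s / k) * F    # r.h.s. of eq. (34)
C2, _, _ = cfg(2, s)                                          # k = 2: explicit C(s)
print(rd_point(k, s), G, lhs_33, rhs_34, C2)
```

At $s = 0$ the optimal density is uniform, and the curve starts at $D = 1/12$ and $R = -1$; increasing $s$ moves toward the high-resolution end of the curve.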

While in general, $R(D)$ is given only in a parametric form and not directly, in the limits of very
low and very high distortion, one can approximate $R(D)$ directly as an explicit function of $D$. In
particular, it is shown in Appendix A that in the low resolution regime,



DiR) ^ -Vl + 

^ ' 12 15 ' 



(36) 
where it should be kept in mind that for this information measure, $R$ takes on values in the interval
$[-1,0]$. Here and throughout the sequel, the notation $A \sim B$ means that $A/B$ tends to unity as a
certain parameter (in this case, $R$) tends to a certain limit (in this case, $-1$), which will always be
clear from the context. Here, the term $1/12$ is the variance of $U$, which is uniform over $[-1/2,+1/2]$,
as no useful information is available except the prior.

In the high-resolution regime ($R \to 0$), the behavior depends on whether $k = 1$, $k = 2$, or
$k > 2$. In Appendix B, derivations are provided for all three cases. For $k = 1$, the rate-distortion
function is approximated as

$$R(D) \sim -4c_1\sqrt{D}, \tag{37}$$

or equivalently, the distortion-rate function is

$$D(R) \sim \frac{R^2}{16c_1^2}, \tag{38}$$

where 



+00 



dt 



k 1/2^^ 



For k > 2, we have 

i^P)--4(^J -D or DiR)^--^l-^^j ■ R, (40) 

The case $k = 2$ lacks an explicit closed-form direct relation between $R$ and $D$, but it shows that

$$\log D \sim \log[-R(D)], \tag{41}$$

which means that the relation between $R$ and $D$ is essentially linear, like in the case $k > 2$, but
in a slightly weaker sense. It is also easy to extend all the derivations to higher-order moments
modulo 1 (see Appendix C for the high resolution analysis).

4.2 Derivation of $I_G(U;Y)$

As mentioned earlier, the channel is assumed to be an AWGN channel with unlimited bandwidth.
The probability law of the channel from $U$ to $Y$ is given by

$$P_{Y|U}(y|u) \propto \exp\left\{-\frac{1}{N_0}\int_0^T [y(t) - x(t,u)]^2\,dt\right\}, \tag{42}$$

where $y$ in the l.h.s. designates the entire channel output waveform $\{y(t),\ 0 \le t \le T\}$, and $\propto$ means
that the constant of proportionality does not depend on $u$. Let us denote

$$\rho(u,u') = \frac{1}{E}\int_0^T x(t,u)\,x(t,u')\,dt. \tag{43}$$

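Consider the integral

$$\begin{aligned}
\int dy \prod_{i=0}^{k}\left[P_{Y|U}(y|u_i)\right]^{1/(k+1)}
&= E\left\{\left.\frac{\prod_{i=1}^{k}\left[P_{Y|U}(y|u_i)\right]^{1/(k+1)}}{\left[P_{Y|U}(y|u_0)\right]^{k/(k+1)}}\ \right|\ U = u_0\right\} \\
&= E\left\{\left.\exp\left[\frac{k}{(k+1)N_0}\int_0^T [y(t) - x(t,u_0)]^2\,dt - \frac{1}{(k+1)N_0}\sum_{i=1}^{k}\int_0^T [y(t) - x(t,u_i)]^2\,dt\right]\ \right|\ U = u_0\right\} \\
&= E\exp\left\{\frac{k}{(k+1)N_0}\int_0^T n^2(t)\,dt - \frac{1}{(k+1)N_0}\sum_{i=1}^{k}\int_0^T [x(t,u_0) + n(t) - x(t,u_i)]^2\,dt\right\} \\
&= \exp\left\{-\frac{1}{(k+1)N_0}\sum_{i=1}^{k}\int_0^T [x(t,u_0) - x(t,u_i)]^2\,dt\right\} \cdot E\exp\left\{\frac{2}{(k+1)N_0}\int_0^T n(t)\left[\sum_{i=1}^{k} x(t,u_i) - k\,x(t,u_0)\right]dt\right\} \\
&= \exp\left\{-\frac{E}{N_0}\left[1 - \frac{1}{(k+1)^2}\sum_{i=0}^{k}\sum_{j=0}^{k}\rho(u_i,u_j)\right]\right\}, \tag{44}
\end{aligned}$$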



where the last passage is associated with the calculation of the moment-generating function of the
Gaussian random variable

$$Z = \int_0^T n(t)\left[\sum_{i=1}^{k} x(t,u_i) - k\,x(t,u_0)\right]dt, \tag{45}$$

which has zero mean and variance $\frac{N_0}{2}\int_0^T \left[\sum_{i=1}^{k} x(t,u_i) - k\,x(t,u_0)\right]^2 dt$.
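The closed form in the last line of (44) can be sanity-checked in a scalar analog of the waveform channel, where each "signal" is a single real number $x_i = \pm\sqrt{E}$ and the noise is a scalar Gaussian with variance $N_0/2$. This reduction, the numerical values, and the Simpson integrator are assumptions made for illustration; in this setting $\rho(u_i,u_j) = x_i x_j / E$.

```python
import math

def simpson(func, lo, hi, n=4000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3.0

N0 = 2.0                    # the noise variance is N0 / 2
E = 1.5                     # common signal energy
signs = [1, -1, 1]          # hypothetical antipodal signals, so k = 2 (three replicas)
xs = [sgn * math.sqrt(E) for sgn in signs]
k = len(xs) - 1

def density(y, x):
    """Scalar analog of (42): Gaussian density with mean x and variance N0 / 2."""
    return math.exp(-(y - x) ** 2 / N0) / math.sqrt(math.pi * N0)

def integrand(y):
    """Product of the k + 1 conditional densities, each raised to the power 1/(k+1)."""
    prod = 1.0
    for x in xs:
        prod *= density(y, x) ** (1.0 / (k + 1))
    return prod

numeric = simpson(integrand, -15.0, 15.0)
rho_sum = sum(si * sj for si in signs for sj in signs)    # double sum of rho(u_i, u_j)
closed_form = math.exp(-(E / N0) * (1.0 - rho_sum / (k + 1) ** 2))
print(numeric, closed_form)
```

The two printed numbers agree to numerical precision, which confirms the bookkeeping of the exponent in the equal-energy case.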

The next step, in principle, is to take another expectation over the last line of (44) w.r.t. the
randomness of $\{U_i\}$. This can be done explicitly for some specific classes of signals (e.g., when $U$
is a phase parameter of a sinusoid), but in general, it is not a trivial task. As in p] and [22], we
then resort to a lower bound (hence an upper bound on $I_G(U;Y)$) based on Jensen's inequality, by
raising the expectation operator to the exponent. Denoting

$$\bar{x}(t) = E\{x(t,U)\} = \int_{-1/2}^{+1/2} du\, x(t,u), \tag{46}$$

it is easily observed that since $\{U_i\}$ are independent, then for all $i \ne j$:

$$E\rho(U_i,U_j) = \frac{1}{E} \cdot E\left\{\int_0^T x(t,U_i)\,x(t,U_j)\,dt\right\} = \frac{1}{E}\int_0^T [\bar{x}(t)]^2\,dt \triangleq \varrho. \tag{47}$$
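Note that the parameter $\varrho$ is always between 0 and 1 and it depends only on the parametric family
of signals.³ Specifically, continuing from the last line of (44), we have

$$\begin{aligned}
E\exp\left\{-\frac{E}{N_0}\left[1 - \frac{1}{(k+1)^2}\sum_{i=0}^{k}\sum_{j=0}^{k}\rho(U_i,U_j)\right]\right\}
&\ge \exp\left\{-\frac{E}{N_0}\left[1 - \frac{1}{(k+1)^2}\sum_{i=0}^{k}\sum_{j=0}^{k} E\rho(U_i,U_j)\right]\right\} \\
&= \exp\left\{-\frac{kE}{(k+1)N_0}(1 - \varrho)\right\}. \tag{48}
\end{aligned}$$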

Note that the expression $E(1 - \varrho)$, which appears in the exponent, is equal to $\int_0^T \mathrm{Var}\{x(t,U)\}\,dt$, which
is a measure of the variability, or the sensitivity, of $x(t,u)$ to the parameter $u$ (in analogy to the
Cramér-Rao bound, which depends on the energy of the derivative of the signal w.r.t. $u$, as another
measure of sensitivity). Accordingly, classes of signals with smaller values of $\varrho$ (or equivalently,
higher values of the integrated variance of $x(t,U)$) are expected to yield a higher value of $I_G(U;Y)$,
and hence smaller estimation error, at least as far as our bounds predict, and since $\varrho$ cannot be
negative, the best classes of signals, in this sense, are those for which $\varrho = 0$. Note also that for
Jensen's inequality to be reasonably tight, the random variables $\{\rho(U_i,U_j)\}$ should all be close to
their expectation $\varrho$ with very high probability, and if this expectation vanishes, as suggested, then
$\{\rho(U_i,U_j)\}$ should all be nearly zero with very high probability. We will get back to classes of
signals with this desirable rapidly vanishing correlation property later on.

4.3 Estimation Error Bounds for the AWGN Channel 

We now equate $R(D)$ to $I_G(U;Y)$ in order to obtain estimation error bounds in the high SNR
regime, where the high-resolution expressions of $R(D)$ are relevant. As discussed above, in this
regime, we will neglect the effect of the modulo 1 operation in the definition of the distortion
measure, and will refer to it hereafter as the ordinary quadratic distortion measure. The choice
$k = 1$ yields $I_G(U;Y) \le -e^{-(1-\varrho)E/(2N_0)}$, and following eq. (38), this yields

$$E(U - V)^2 \ge D\left(-e^{-(1-\varrho)E/(2N_0)}\right) = \frac{e^{-(1-\varrho)E/N_0}}{16c_1^2}, \tag{49}$$



For example, if x{t,u) — XQ{t — it) is a rectangular pulse of duration A then g — A/T. 
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and so, the exponential decay of the lower bound is according to e~(^~^)^/^o. For k = 2, according 
to eq. (j4ip . we have logD ~ 2(1 — q)E/{3Nq), which means an exponential decay according to 
^-2{i-g)E/{3No) ^ ^j^-^j^ -g ]-,g^^gj.^ YoY k > 3, we use (I40p and the resulting bound decays according 
to exp{— (1 — p)kE/[[k + l)A'^o]}, which is better than the result of A; = 1, but not as good as 
the one of A; = 2. Thus, the best choice of k for the high SNR regime is A; = 2, namely, a 
generalized Bhattacharyya distance with A; + 1 = 3 replicas, rather the two replicas of the ordinary 
Bhattacharyya distance. 
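The comparison of decay rates for different values of $k$ can be tabulated with a few lines of code. The following is a minimal sketch, assuming (as read from the text above) that $-I_G(U;Y)$ decays as $\exp\{-k(1-\varrho)E/[(k+1)N_0]\}$, and that the high-resolution distortion-rate function behaves like $(-R)^2$ for $k = 1$ and like $-R$ (up to constants or logarithmic factors) for $k \ge 2$. The function name and the variable `rho`, standing for the correlation parameter of (47), are illustrative only.

```python
from fractions import Fraction

def mse_decay_coefficient(k: int) -> Fraction:
    """Coefficient a_k such that the MSE bound decays like exp{-a_k (1 - rho) E/N0}."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    info_rate = Fraction(k, k + 1)   # -I_G ~ exp{-k/(k+1) * (1 - rho) E/N0}
    d_exponent = 2 if k == 1 else 1  # D(R) ~ (-R)^2 for k = 1, ~ (-R) for k >= 2
    return d_exponent * info_rate

coeffs = {k: mse_decay_coefficient(k) for k in range(1, 8)}
best_k = min(coeffs, key=coeffs.get)  # slowest decay = tightest lower bound
for k, a in coeffs.items():
    print(f"k = {k}: exp(-{a} (1 - rho) E/N0)")
print("best k:", best_k)
```

The slowest decay is obtained at $k = 2$ (coefficient 2/3), in agreement with the discussion above; for larger $k$ the coefficient $k/(k+1)$ climbs back toward 1.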

Note that since $\varrho \ge 0$, as mentioned earlier, then for any family of signals, the exponential
function $e^{-2E/(3N_0)}$ is a universal lower bound (at high SNR) in the sense that it applies, not only
to every estimator of $U$, but also to every parametric family of signals $\{x(t,u)\}$, i.e., to every
modulation scheme, without being dependent on this modulation scheme (see also [22]). This is
in contrast to most of the estimation error bounds in the literature. In other words, it sets a
fundamental limit on the entire communication system and not only on the receiver end for a
given transmitter. Indeed, for some classes of signals, an MSE with exponential decay in $E/N_0$ is
attainable at least in the high SNR regime, although there might be gaps in the actual exponential
rates compared to the above mentioned bound. For example, in [15], it is discussed that in the case
of time delay estimation ($x(t,u) = x_0(t-u)$), it is possible to achieve an MSE of the exponential
order of $e^{-E/(3N_0)}$ by allowing the pulse $x_0(t)$ to have bandwidth that grows exponentially with
$T$.* Thus, by improving the lower bound $\exp(-E/N_0)$ (a special case of the above with $k = 1$)
to $\exp[-2E/(3N_0)]$, we are halving the gap between the exponential rates of the upper bound and
the lower bound, from $2E/(3N_0)$ to $E/(3N_0)$.

Our asymptotic lower bound should be compared to other lower bounds available in the literature.
One natural candidate would be the Weiss-Weinstein bound (WWB) [18], [19], [20], which
for the model under discussion at high SNR, reads [18, p. 66]:

$$\mathrm{WWB} = \sup_{h \ne 0} \frac{h^2 \exp\{-[1-r(h)]E/N_0\}}{2\left(1 - \exp\{-[1-r(2h)]E/(2N_0)\}\right)}, \tag{50}$$

where $r(h) = \rho(u,u+h) = \int_0^T x(t,u)x(t,u+h)\,dt/E$ is assumed to depend only on $h$ and not on $u$.



While this is an excellent bound for a given modulation scheme $\{x(t,u),\ u \in \mathcal{U}\}$, it does not seem



* Other examples include chirp-like signals, e.g., x{t,u) — sin(Me^') (for some given R > 0), as well as chaotic 
signals parametrized by their initial condition - see [11], [12] and references therein.

to lend itself easily to the derivation of universal lower bounds, as discussed above. To this end, in
principle, the WWB should be minimized over all feasible correlation functions $r(\cdot)$, which is not
a trivial task. A reasonable compromise is to first minimize the WWB over $r(\cdot)$ for a given $h$, and
then to maximize the resulting expression over $h$ (i.e., max-min instead of min-max). Since the
expression of the bound is a monotonically increasing function of both $r(h)$ and $r(2h)$, and since
both $r(h)$ and $r(2h)$ cannot be smaller than $-1$, we end up with

$$\mathrm{WWB} = \sup_{h \ne 0} \frac{h^2 e^{-2E/N_0}}{2\left(1 - e^{-E/N_0}\right)} \tag{51}$$

as a modulation-independent bound. This is a faster exponential decay rate (and hence weaker
asymptotically) than that of our proposed bound for $k = 2$.

It is possible, however, to obtain a universal lower bound stronger than both bounds by a simple
channel-coding argument, which is in the spirit of the Ziv-Zakai bound [23]. This bound is given
by (see Appendix D for the derivation):

$$\mathbf{E}(U-V)^2 \ge \frac{1}{8M^2}\,Q\left(\sqrt{\frac{E}{N_0}\cdot\frac{M}{M-2}}\right), \tag{52}$$

where

$$Q(x) \triangleq \frac{1}{\sqrt{2\pi}}\int_x^{\infty} e^{-t^2/2}\,dt \tag{53}$$

and where $M$ is a free parameter, an even integer not smaller than 4, which is subjected to
optimization. Throughout the sequel, we refer to this bound as the channel-coding bound. In the high
SNR regime, the exponential order of the channel-coding bound (for fixed $M$) is

$$\exp\left\{-\frac{E}{2N_0}\cdot\frac{M}{M-2}\right\}, \tag{54}$$

which for large enough $M$ becomes arbitrarily close to $e^{-E/(2N_0)}$, and hence better than the
data-processing bound of $e^{-2E/(3N_0)}$. Note that the Ziv-Zakai bound [23] would be weaker in this context
of universal lower bounds, since it is based on binary hypothesis testing ($M/2 = 2$), yielding an
exponent of $e^{-E/N_0}$.
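To get a feeling for the trade-off in the free parameter $M$, the channel-coding bound can be evaluated numerically. The sketch below assumes the form $(1/(8M^2))\,Q(\sqrt{(E/N_0)\,M/(M-2)})$, with the prefactor $1/(8M^2)$ taken from the derivation in Appendices D and E; the function names and the chosen SNR value are illustrative.

```python
import math

def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def channel_coding_bound(snr: float, m: int) -> float:
    """(1 / (8 M^2)) * Q(sqrt(snr * M / (M - 2))), with snr = E/N0 and even M >= 4."""
    if m < 4 or m % 2:
        raise ValueError("M must be an even integer >= 4")
    return q_function(math.sqrt(snr * m / (m - 2))) / (8.0 * m * m)

def exponent_coefficient(m: int) -> float:
    """Coefficient of E/N0 in the high-SNR exponent: M / (2 (M - 2))."""
    return m / (2.0 * (m - 2))

snr = 50.0  # illustrative value of E/N0
for m in (4, 10, 16, 64):
    print(m, exponent_coefficient(m), channel_coding_bound(snr, m))
```

For $M = 4$ the exponent coefficient equals 1 (the binary-hypothesis-testing case), while for $M \ge 10$ it drops below the value 2/3 of the data-processing bound and approaches 1/2 as $M$ grows, at the price of a smaller prefactor $1/(8M^2)$.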

In view of this comparison, it is natural to ask what the benefit of our data-processing lower
bound is. The answer is that the potential of the data-processing bound is much better exploited
in situations of channel uncertainty, like in channels with fading. This is the subject of the next
subsection.

4.4 Estimation Error Bounds for the AWGN Channel with Fading



It turns out that the feature that makes the data-processing-theorem approach to error lower
bounds more powerful, relative to other approaches, is the convexity property of the generalized
mutual information (in this case, $I_G(U;Y)$) w.r.t. the channel $P_{Y|U}$. Suppose that the channel
actually depends on an additional random parameter $A$ (independent of $U$), that is known to
neither the transmitter nor the receiver, namely,

$$P_{Y|U}(y|u) = \int_{-\infty}^{+\infty} da\, P_A(a)P_{Y|U,A}(y|u,a), \tag{55}$$

where $P_A(a)$ is the density of $A$. If we think of $I_G(U;Y)$ as a functional of $P_{Y|U}$, denoted
$\mathcal{I}(P_{Y|U}(\cdot|u))$, then it is a convex functional, namely,

$$\mathcal{I}\left(\int_{-\infty}^{+\infty} da\, P_A(a)P_{Y|U,A}(\cdot|u,a)\right) \le \int_{-\infty}^{+\infty} da\, P_A(a)\,\mathcal{I}\left(P_{Y|U,A}(\cdot|u,a)\right). \tag{56}$$

This is a desirable property because the r.h.s. reflects a situation where $A$ is known to both parties,
whereas the l.h.s. pertains to the situation where $A$ is unknown, so the lower bound associated with
the case where $A$ is unknown is always tighter than the expectation of the lower bound pertaining
to a known $A$. The WWB, on the other hand, does not have this convexity property, as we shall
see.

Consider now the case where $A$ is a fading parameter, drawn only once and kept fixed throughout
the entire observation time $T$. More precisely, our model is the same as before except that now the
signal is subjected to fading according to

$$y(t) = a \cdot x(t,u) + n(t), \quad 0 \le t < T, \tag{57}$$

where $a$ and $u$ are realizations of the random variables $A$ and $U$, respectively. For the sake of
convenience in the analysis, we assume that $A$ is a zero-mean Gaussian random variable with
variance $\sigma^2$ (other densities are, of course, possible too).

We next compare the three corresponding bounds in this case. The overall channel from $U$ to
$Y$ is

$$P_{Y|U}(y|u) \propto \int_{-\infty}^{+\infty} da\, \frac{e^{-a^2/(2\sigma^2)}}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{N_0}\int_0^T [y(t) - a \cdot x(t,u)]^2\,dt\right\}. \tag{58}$$

Carrying out the integration, we readily obtain

$$P_{Y|U}(y|u) \propto \exp\left\{\theta\left[\int_0^T y(t)x(t,u)\,dt\right]^2\right\}, \tag{59}$$

where

$$\theta = \frac{2\sigma^2}{N_0^2(1 + 2\sigma^2E/N_0)}. \tag{60}$$

Thus,


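$$I_G(U;Y) = -\int_{-1/2}^{+1/2} du_0\, \mathbf{E}\left\{\exp\left[\frac{\theta}{k+1}\sum_{i=1}^{k}\left(\int_0^T y(t)x(t,U_i)\,dt\right)^2\right] \cdot \exp\left[-\frac{k\theta}{k+1}\left(\int_0^T y(t)x(t,u_0)\,dt\right)^2\right] \,\Bigg|\, U = u_0\right\}. \tag{61}$$

Upon substituting $y(t) = Ax(t,u_0) + n(t)$, one obtains, after some straightforward algebra,

$$I_G(U;Y) = -\mathbf{E}\exp\left\{\frac{\theta}{k+1}\left(A^2E^2\sum_{i=1}^{k}\rho^2(U_0,U_i) + 2AE\sum_{i=1}^{k}\rho(U_0,U_i)Z_i + \sum_{i=1}^{k}Z_i^2\right) - \frac{\theta k}{k+1}\left(A^2E^2 + 2AEZ_0 + Z_0^2\right)\right\}, \tag{62}$$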



where

$$Z_i = \int_0^T n(t)x(t,U_i)\,dt, \quad i = 0,1,2,\ldots,k, \tag{63}$$

and where the expectation is w.r.t. the randomness of $A$, $\{U_i\}$ and $\{Z_i\}$. Obviously, given
$A$ and $\{U_i\}$, the random variables $\{Z_i\}$ are jointly Gaussian with zero mean and covariances
$\frac{N_0}{2}E\rho(U_i,U_j)$. Motivated by the discussion at the end of Subsection 4.2, we now adopt the
assumption of signals with rapidly vanishing correlation. In other words, we assume that $\rho(u,u+h)$
vanishes so rapidly* as a function of $h$ for every $u$, that it is safe to neglect $\rho(U_i,U_j)$ altogether for
all $i \ne j$. This would make $\{Z_i\}$ independent and simplify the above expression to

$$I_G(U;Y) = -\mathbf{E}\left\{\exp\left[-\frac{\theta k E^2A^2}{k+1}\right] \cdot \exp\left[-\frac{\theta k}{k+1}\left(Z_0^2 + 2AEZ_0\right)\right]\right\} \cdot \left[\mathbf{E}\exp\left\{\frac{\theta Z_1^2}{k+1}\right\}\right]^k. \tag{64}$$



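Upon calculating the expectation (w.r.t. both $A$ and $\{Z_i\}$), we obtain

$$I_G(U;Y) = -\left[\frac{(k+1)(1 + 2\sigma^2E/N_0)}{k + 1 + 2k\sigma^2E/N_0}\right]^{k/2} \cdot \sqrt{\frac{(k+1)(1 + 2\sigma^2E/N_0)}{(k+1)(1 + 2\sigma^2E/N_0) + 2k\sigma^2E/N_0}} \cdot \frac{1}{\sqrt{1 + 2\beta\sigma^2}}, \tag{65}$$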



* Consider an asymptotic regime under which the signal $x(t,u)$ depends on an additional (design) parameter $\Delta$, so
that for every $h \ne 0$, $\rho(h) \to 0$ as $\Delta$ tends to a certain limit, and that this limit is taken before the limit $E/N_0 \to \infty$.
For example, if $x(t,u) = x_0(t-u)$ is a rectangular pulse of amplitude $\sqrt{E/\Delta}$ and duration $\Delta$, then $\rho(h) = [1 - |h|/\Delta]_+$,
which obviously vanishes as $\Delta \to 0$ for every $h \ne 0$.

where

$$\beta = \frac{2k\sigma^2(E/N_0)^2}{2(2k+1)\sigma^2E/N_0 + k + 1}. \tag{66}$$

Considering the high-SNR regime ($E/N_0 \gg 1$), this is approximated as

$$I_G(U;Y) \approx -\frac{1}{\sigma\sqrt{2E/N_0}}\left(1 + \frac{1}{k}\right)^{(k+1)/2}. \tag{67}$$

Applying the high-resolution approximation of $D(R)$ for $k \ge 3$, we get:

$$\mathbf{E}(U-V)^2 \ge \frac{g_k}{\sigma}\sqrt{\frac{N_0}{E}}, \tag{68}$$

where

$$g_k = \frac{1}{4\sqrt{2}}\left(1 - \frac{2}{k}\right)^k\left(1 + \frac{1}{k}\right)^{(k+1)/2}. \tag{69}$$

A simple numerical study indicates that $\{g_k\}$ is monotonically increasing and so the best bound is
obtained for $k \to \infty$ (infinitely many replicas), where the constant is:

$$g_\infty = \lim_{k\to\infty} g_k = \frac{1}{4\sqrt{2}\,e^{3/2}} = 0.03944. \tag{70}$$

Thus, our asymptotic lower bound for high SNR is

$$\liminf_{E/N_0\to\infty} \sqrt{\frac{E}{N_0}} \cdot \mathbf{E}(U-V)^2 \ge \frac{0.03944}{\sigma}. \tag{71}$$
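The "simple numerical study" of the sequence $\{g_k\}$ mentioned above can be reproduced along the following lines. This is only a sketch: the closed form used for $g_k$, namely $(1/(4\sqrt{2}))(1-2/k)^k(1+1/k)^{(k+1)/2}$, is our reading of the garbled display, obtained by combining the high-SNR form of $I_G(U;Y)$ with the high-resolution distortion-rate function $D(R) \approx -\frac{1}{4}(1-2/k)^k R$ of Appendix B; it does reproduce the quoted limit 0.03944.

```python
import math

def g(k: int) -> float:
    """g_k = (1/(4*sqrt(2))) * (1 - 2/k)**k * (1 + 1/k)**((k + 1)/2), for k >= 3."""
    if k < 3:
        raise ValueError("the high-resolution form used here requires k >= 3")
    return (1 - 2 / k) ** k * (1 + 1 / k) ** ((k + 1) / 2) / (4 * math.sqrt(2))

g_inf = math.exp(-1.5) / (4 * math.sqrt(2))  # limit of g_k as k -> infinity
values = [g(k) for k in range(3, 2001)]
is_increasing = all(a < b for a, b in zip(values, values[1:]))
print(f"g_3 = {g(3):.5f}, g_2000 = {values[-1]:.5f}, limit = {g_inf:.5f}")
print("monotonically increasing on 3..2000:", is_increasing)
```

Under these assumptions the sequence increases from about 0.0116 at $k = 3$ toward the limit $e^{-3/2}/(4\sqrt{2}) \approx 0.03944$, which is why the bound improves with the number of replicas in the fading case.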

The WWB [18, p. 51], in its more general form, is given by

$$\mathrm{WWB} = \sup_{s,h} \frac{h^2 e^{2\mu(s,h)}}{e^{\mu(2s,h)} + e^{\mu(2-2s,-h)} - 2e^{\mu(s,2h)}}, \tag{72}$$

where

$$e^{\mu(s,h)} = \mathbf{E}\left\{\left[\frac{P_{Y|U}(Y|U+h)}{P_{Y|U}(Y|U)}\right]^s\right\}, \quad s \in [0,1], \tag{73}$$

which for the fading channel under the high SNR regime of rapidly vanishing correlation signals,
can be shown (using similar calculations as above) to be given by

$$e^{\mu(s,h)} = \begin{cases} \sqrt{\dfrac{1 + 2\sigma^2E/N_0}{(1 + 2s\sigma^2E/N_0)(1 + 2[1-s]\sigma^2E/N_0)}} & h \ne 0 \\ 1 & h = 0 \end{cases} \tag{74}$$

The problem is that, unless $s = 1/2$, either $2s > 1$ or $2 - 2s > 1$, and so correspondingly, for large
enough values of $E/N_0$, either $e^{\mu(2s,h)}$ or $e^{\mu(2-2s,-h)}$ in the denominator diverges, and the WWB
becomes useless. Thus, the only feasible choice of $s$ is $s = 1/2$, in which case, the WWB becomes

$$\mathrm{WWB} = \sup_{h} \frac{h^2 e^{2\mu(1/2,h)}}{2\left[1 - e^{\mu(1/2,2h)}\right]}. \tag{75}$$

But $e^{\mu(1/2,h)}$ is exactly our information measure for $k = 1$, and so,




to the data processing bound. 

The channel-coding bound is based on a universal lower bound on the probability of error,
which holds for every signal set. The problem is that under fading, we are not aware of such a
universal lower bound. The only remaining alternative then is to use a lower bound corresponding
to the case where $A$ is known to the receiver, and then to take the expectation w.r.t. $A$, although
one might argue that this comparison is not quite fair. Nonetheless, the derivation of this bound
appears in Appendix E and the result is

$$\liminf_{E/N_0\to\infty} \sqrt{\frac{E}{N_0}} \cdot \mathbf{E}(U-V)^2 \ge \frac{1}{128\pi\sqrt{2}\,\sigma} = \frac{0.001758}{\sigma}. \tag{77}$$

Thus, the data processing bound is better by a factor of 22.4 (13.5 dB).
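The averaging over the fading parameter, which underlies the constant 0.001758, can be checked numerically. The sketch below is written under the assumptions of Appendix E as we read them: $A$ is zero-mean Gaussian with variance $\sigma^2$, Craig's formula is used for $Q(\cdot)$, and the known-fading bound has the form $(1/(8M^2))\,Q(|a|\sqrt{c})$ with $c = (E/N_0)M/(M-2)$. The function names, the midpoint-rule integrator and the parameter values are illustrative.

```python
import math

def averaged_q(sigma: float, c: float, n: int = 20000) -> float:
    """E[Q(|A| sqrt(c))] for A ~ N(0, sigma^2), via Craig's formula.

    Evaluates (1/pi) * int_0^{pi/2} sin(t) / sqrt(sin(t)^2 + sigma^2 c) dt
    with the composite midpoint rule.
    """
    b = sigma * sigma * c
    step = (math.pi / 2) / n
    total = 0.0
    for i in range(n):
        s = math.sin((i + 0.5) * step)
        total += s / math.sqrt(s * s + b)
    return total * step / math.pi

def fading_channel_coding_constant(m: int) -> float:
    """High-SNR constant sqrt((M - 2) / M^5) / (8 pi) multiplying 1/(sigma sqrt(E/N0))."""
    return math.sqrt((m - 2) / m ** 5) / (8 * math.pi)

sigma, snr, m = 1.0, 1.0e4, 4
c = snr * m / (m - 2)
exact = averaged_q(sigma, c)
lower = 1.0 / (math.pi * math.sqrt(1.0 + sigma * sigma * c))  # bound obtained from sin^2 <= 1
print(exact, lower)
best = max(range(4, 41, 2), key=fading_channel_coding_constant)
print(best, fading_channel_coding_constant(best))
```

At high SNR the averaged error probability and its lower bound nearly coincide, and the resulting constant is maximized over even $M \ge 4$ at $M = 4$, where it evaluates to about 0.001758.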

Yet another comparison, perhaps more fair, can be made with a related bound, which is based on
binary hypothesis testing, but has the advantage of avoiding the use of the Chebychev inequality,
that was used in the channel-coding bound. This is the Chazan-Zakai-Ziv bound (CZZB), an
improved version of the Ziv-Zakai bound [23]. According to the CZZB, applied to our problem (see
Appendix F for the derivation),

$$\liminf_{E/N_0\to\infty} \sqrt{\frac{E}{N_0}} \cdot \mathbf{E}(U-V)^2 \ge \frac{0.00716}{\sigma}, \tag{78}$$

which is again significantly smaller than our bound. Thus, we observe that while the WWB and
the CZZB are excellent bounds for ordinary channels without fading, when it comes to channels
with fading, the proposed data-processing bound has an advantage.
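The three high-SNR constants quoted in this subsection can be put side by side as follows. This is a small sketch in which the closed forms $e^{-3/2}/(4\sqrt{2})$, $1/(128\pi\sqrt{2})$ and $\sqrt{2}e^{-\sqrt{2}}/48$ are our reading of the derivations in the text and in Appendices E and F; they agree with the quoted decimals 0.03944, 0.001758 and 0.00716.

```python
import math

# Constants multiplying 1 / (sigma * sqrt(E/N0)) in the three asymptotic bounds.
data_processing = math.exp(-1.5) / (4 * math.sqrt(2))       # about 0.03944
channel_coding = 1 / (128 * math.pi * math.sqrt(2))         # about 0.001758
czzb = math.sqrt(2) * math.exp(-math.sqrt(2)) / 8 / 6       # about 0.00716

def gain_db(ratio: float) -> float:
    """Express a ratio of MSE bounds in decibels."""
    return 10 * math.log10(ratio)

for name, value in (("channel coding", channel_coding), ("CZZB", czzb)):
    ratio = data_processing / value
    print(f"{name}: factor {ratio:.2f} ({gain_db(ratio):.1f} dB)")
```

The first ratio reproduces the factor of 22.4 (13.5 dB) stated above; relative to the CZZB, the data-processing constant is larger by a factor of about 5.5.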

5 Conclusion 

In this work, we have explored a certain class of information measures [10], which, although a
special case of the Zakai-Ziv information measures [22], has an interesting structure that calls
for attention. We first put this class of information measures in a broader perspective, relating
it to other information measures, like those of [22], and then, by a specific choice of the convex
functions, we defined a generalized notion of the Chernoff divergence that is based on an arbitrary
number of replicas of the channel. Relations have been drawn between the generalized Chernoff
divergence and the Gallager function, the ordinary Chernoff divergence, and even more specifically,
the Bhattacharyya distance. We have also suggested a somewhat more general structured class
based on factor trees. We then applied the data processing inequality, based on the generalized
Chernoff divergence, and demonstrated that sometimes bounds can be improved by using more
than $k + 1 = 2$ replicas. In particular, three replicas is the optimum number
in the AWGN model, thus improving on [22], where only two replicas were used (the ordinary
Bhattacharyya distance). While this bound still falls short compared to other bounds available
from estimation theory, the data processing bound seems to be more powerful than others when it
comes to channels with uncertainty, like fading channels. In this case, the limit of $k \to \infty$ gives the
best result.

Appendix A

Low Resolution Analysis

Low resolution analysis corresponds to very small values of $s$, which can be handled by a first
order Taylor series expansion of the functions $F(s)$, $C(s)$ and $G(s)$. Specifically,

$$C(s) \approx 1 + \frac{k+1}{12k} \cdot s, \tag{A.1}$$

$$F(s) \approx \frac{1}{12} - \frac{k+1}{80k} \cdot s, \tag{A.2}$$

$$[G(s)]^{k+1} \approx 1 - \frac{k+1}{12k} \cdot s. \tag{A.3}$$

Thus,

$$D_s = C(s)F(s) \approx \frac{1}{12} - \frac{k+1}{180k} \cdot s \tag{A.4}$$

and

$$-R(D_s) = C(s)[G(s)]^{k+1} \approx 1 - \left(\frac{k+1}{12k} \cdot s\right)^2, \tag{A.5}$$

or

$$s \approx \frac{12k}{k+1}\sqrt{R+1}. \tag{A.6}$$

Appendix B 

High Resolution Analysis 

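High resolution corresponds to $s \gg 1$. In this case, we have

$$\frac{1}{C(s)} = \int_{-1/2}^{+1/2} \frac{dw}{(1 + sw^2)^{1+1/k}} \approx \frac{1}{\sqrt{s}}\int_{-\infty}^{+\infty} \frac{dt}{(1 + t^2)^{1+1/k}} \triangleq \frac{c_k}{\sqrt{s}}. \tag{B.1}$$

Now, according to the relations between the functions $C$, $F$ and $G$, derived in Subsection 4.1, we
have:

$$G(s) = \frac{1}{C(s)} + sF(s) \tag{B.2}$$

and also

$$G(s) = \frac{1}{(1 + s/4)^{1/k}} + \frac{2s}{k}F(s). \tag{B.3}$$

Comparing the two expressions of $G(s)$, we get

$$\frac{1}{C(s)} + sF(s) = \frac{1}{(1 + s/4)^{1/k}} + \frac{2s}{k}F(s), \tag{B.4}$$

which leads to the equation

$$\left(1 - \frac{2}{k}\right)sF(s) = \frac{1}{(1 + s/4)^{1/k}} - \frac{1}{C(s)}. \tag{B.5}$$

At this stage, we have to handle separately the cases $k = 1$, $k = 2$ and $k > 2$.
Let us consider the case $k = 1$ first. In this case, the last equation reads

$$-sF(s) = \frac{1}{1 + s/4} - \frac{1}{C(s)} \approx \frac{4}{s} - \frac{c_1}{\sqrt{s}} \approx -\frac{c_1}{\sqrt{s}}, \tag{B.6}$$

and so,

$$F(s) \approx \frac{c_1}{s^{3/2}}. \tag{B.7}$$

Thus, from the distortion equation,

$$D_s = C(s)F(s) \approx \frac{\sqrt{s}}{c_1} \cdot \frac{c_1}{s^{3/2}} = \frac{1}{s}, \tag{B.8}$$

or equivalently, $s = 1/D_s$. Now,

$$G(s) = \frac{1}{C(s)} + sF(s) \approx \frac{c_1}{\sqrt{s}} + s \cdot \frac{c_1}{s^{3/2}} = \frac{2c_1}{\sqrt{s}}. \tag{B.9}$$

From the rate equation, we have

$$R(D_s) = -C(s)[G(s)]^2 \approx -\frac{\sqrt{s}}{c_1} \cdot \frac{4c_1^2}{s} = -\frac{4c_1}{\sqrt{s}}, \tag{B.10}$$

which means

$$R(D) \approx -4c_1\sqrt{D}, \tag{B.11}$$

or equivalently, the distortion-rate function is

$$D(R) \approx \frac{R^2}{16c_1^2}, \tag{B.12}$$

where it should be kept in mind that $R$ takes on values in the range $[-1,0]$ in this case.
The case $k = 2$ is handled as follows:

$$G(s) = \int_{-1/2}^{+1/2} \frac{dw}{\sqrt{1 + sw^2}} = \frac{1}{\sqrt{s}}\ln\frac{\sqrt{1 + s/4} + \sqrt{s}/2}{\sqrt{1 + s/4} - \sqrt{s}/2} = \frac{1}{\sqrt{s}}\ln\left(1 + \frac{s}{2} + \sqrt{s}\sqrt{\frac{s}{4} + 1}\right), \tag{B.13}$$

and so, for $s$ large, $G(s) \approx (\ln s)/\sqrt{s}$. By comparing the two expressions for $G(s)$, we get
$C(s) = \sqrt{1 + s/4} \approx \sqrt{s}/2$. Consequently,

$$F(s) = \frac{G(s) - 1/C(s)}{s} \approx \frac{\ln s}{s^{3/2}}. \tag{B.14}$$

Thus, $D_s = F(s)C(s) \approx (\ln s)/(2s)$ and $-R(D_s) = C(s)G^3(s) \approx (\ln^3 s)/(2s)$. In the high-resolution
limit, the logarithmic terms are relatively negligible and so, we can deduce that

$$\lim_{s\to\infty} \frac{\log D_s}{\log[-R(D_s)]} = \lim_{D\to 0} \frac{\log D}{\log[-R(D)]} = 1.$$

Finally, we examine the case $k > 2$. Returning to eq. (B.5), now we have:

$$\left(1 - \frac{2}{k}\right)sF(s) = \frac{1}{(1 + s/4)^{1/k}} - \frac{1}{C(s)} \approx \frac{4^{1/k}}{s^{1/k}} - \frac{c_k}{\sqrt{s}} \approx \frac{4^{1/k}}{s^{1/k}}, \tag{B.15}$$

and so

$$F(s) \approx \frac{k4^{1/k}}{(k-2)s^{1+1/k}} \tag{B.16}$$

and

$$G(s) \approx \frac{c_k}{\sqrt{s}} + \frac{k4^{1/k}}{(k-2)s^{1/k}} \approx \frac{k4^{1/k}}{(k-2)s^{1/k}}. \tag{B.17}$$

The distortion equation then gives

$$D_s = C(s)F(s) \approx \frac{\sqrt{s}}{c_k} \cdot \frac{k4^{1/k}}{(k-2)s^{1+1/k}} = \frac{k4^{1/k}}{(k-2)c_ks^{1/2+1/k}} \tag{B.18}$$

and the rate equation yields

$$R(D_s) = -C(s)[G(s)]^{k+1} \approx -\frac{\sqrt{s}}{c_k}\left[\frac{k4^{1/k}}{(k-2)s^{1/k}}\right]^{k+1} = -\frac{4^{1+1/k}}{c_ks^{1/2+1/k}}\left(\frac{k}{k-2}\right)^{k+1} = -4\left(\frac{k}{k-2}\right)^k D_s. \tag{B.19}$$

Thus, the rate-distortion function and the distortion-rate function are approximated as

$$R(D) \approx -4\left(\frac{k}{k-2}\right)^k D, \tag{B.20}$$

$$D(R) \approx -\frac{1}{4}\left(1 - \frac{2}{k}\right)^k R. \tag{B.21}$$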
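The high-resolution approximations for $k = 1$ can be sanity-checked by direct numerical integration. The following is a sketch under assumptions that are not spelled out in this appendix: we take $1/C(s) = \int_{-1/2}^{1/2}(1+sw^2)^{-(1+1/k)}dw$ as in (B.1), assume $F(s) = \int_{-1/2}^{1/2}w^2(1+sw^2)^{-(1+1/k)}dw$, and use $G(s) = 1/C(s) + sF(s)$, $D_s = C(s)F(s)$ and $R(D_s) = -C(s)[G(s)]^{k+1}$. The Simpson integrator and all function names are illustrative.

```python
import math

def simpson(f, a: float, b: float, n: int) -> float:
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def distortion_and_rate(s: float, k: int, n: int = 200000):
    """Return (D_s, R(D_s)) under the integral definitions assumed in the lead-in."""
    inv_c = simpson(lambda w: (1 + s * w * w) ** (-1 - 1 / k), -0.5, 0.5, n)
    f = simpson(lambda w: w * w * (1 + s * w * w) ** (-1 - 1 / k), -0.5, 0.5, n)
    g = inv_c + s * f                 # G(s) = 1/C(s) + s F(s)
    c = 1 / inv_c
    return c * f, -c * g ** (k + 1)

s = 1.0e6                             # large s: high-resolution regime
d1, r1 = distortion_and_rate(s, 1)
c1 = math.pi / 2                      # c_1 = integral of dt / (1 + t^2)^2 over the real line
print("s * D_s:", d1 * s)                                        # close to 1
print("D(R) approx / D_s:", r1 ** 2 / (16 * c1 ** 2) / d1)        # close to 1
```

For $s = 10^6$ both printed ratios are within a fraction of a percent of 1, consistent with $D_s \approx 1/s$ and $D(R) \approx R^2/(16c_1^2)$. For $k > 2$ the relative corrections decay only like $s^{1/k - 1/2}$, so much larger values of $s$ would be needed to see comparable agreement.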
Appendix C



Higher Order Moments 

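The high-resolution analysis can easily be extended to handle general moments of the estimation
error, $\mathbf{E}|U-V|^p$, $p > 0$ ($p$ need not be an integer). This gives for large $s$,

$$\frac{1}{C(s)} = \int_{-1/2}^{+1/2} \frac{dw}{[1 + s|w|^p]^{1+1/k}} \approx \frac{1}{s^{1/p}}\int_{-\infty}^{+\infty} \frac{dt}{[1 + |t|^p]^{1+1/k}} \triangleq \frac{c}{s^{1/p}} \tag{C.1}$$

and

$$G(s) = \int_{-1/2}^{+1/2} \frac{dw}{[1 + s|w|^p]^{1/k}} = \frac{1}{C(s)} + sF(s). \tag{C.2}$$

Here, we have to handle separately the cases $k < p$ and $k > p$ (the case $k = p$ will not be
covered here, but since $p$ is allowed to be non-integer, it can be approached by either $p \downarrow k$ or
$p \uparrow k$). In the case $k < p$, we have

$$G(s) \approx \frac{1}{s^{1/p}}\int_{-\infty}^{+\infty} \frac{dt}{[1 + |t|^p]^{1/k}} \triangleq \frac{c'}{s^{1/p}}, \tag{C.3}$$

and so

$$F(s) = \frac{G(s) - 1/C(s)}{s} \approx \frac{c' - c}{s^{1+1/p}}. \tag{C.4}$$

Thus,

$$D_s = C(s)F(s) \approx \frac{c' - c}{cs}. \tag{C.5}$$

Now,

$$s \approx \frac{c' - c}{cD_s}, \tag{C.6}$$

and so

$$-R(D_s) = C(s)[G(s)]^{k+1} \approx \frac{(c')^{k+1}}{c}\left[\frac{cD_s}{c' - c}\right]^{k/p}. \tag{C.7}$$

Thus,

$$D(R) \approx S_1(k,p)[-R]^{p/k}, \tag{C.8}$$

where

$$S_1(k,p) = \frac{c' - c}{c}\left[\frac{c}{(c')^{k+1}}\right]^{p/k}. \tag{C.9}$$

Note that in terms of the asymptotic behavior for small values of $-R$, the best choice of $k$ is the
largest integer strictly less than $p$. For $p$ integer, this means $k = p - 1$. As for the case $k > p$, we
get:

$$sF(s) = \int_{-1/2}^{+1/2} \frac{s|w|^p\,dw}{[1 + s|w|^p]^{1+1/k}} \approx \int_{-1/2}^{+1/2} \frac{dw}{(s|w|^p)^{1/k}} = \frac{k2^{p/k}}{(k-p)s^{1/k}}, \tag{C.10}$$

or

$$F(s) \approx \frac{k2^{p/k}}{(k-p)s^{1+1/k}}. \tag{C.11}$$

So

$$D_s = C(s)F(s) \approx \frac{k2^{p/k}}{c(k-p)s^{1+1/k-1/p}}. \tag{C.12}$$

Here,

$$G(s) = \frac{1}{C(s)} + sF(s) \approx \frac{c}{s^{1/p}} + \frac{k2^{p/k}}{(k-p)s^{1/k}} \approx \frac{k2^{p/k}}{(k-p)s^{1/k}}. \tag{C.13}$$

Then,

$$R(D_s) = -C(s)[G(s)]^{k+1} \approx -2^p\left(\frac{k}{k-p}\right)^k D_s, \tag{C.14}$$

and we get

$$D(R) \approx -S_2(k,p)R, \tag{C.15}$$

where

$$S_2(k,p) = 2^{-p}\left(1 - \frac{p}{k}\right)^k. \tag{C.16}$$

Appendix D

Derivation of the Channel Coding Bound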

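For a given positive integer $M$, consider the following chain of inequalities:

$$\begin{aligned}
\mathbf{E}(U-V)^2 &\ge \left(\frac{1}{2M}\right)^2 \Pr\left\{|U-V| \ge \frac{1}{2M}\right\} \\
&= \frac{1}{4M^2}\int_{-1/2}^{+1/2} du \cdot \Pr\left\{|U-V| \ge \frac{1}{2M} \,\Big|\, U = u\right\} \\
&= \frac{1}{4M^2}\sum_{i=0}^{M-1}\int_{-1/(2M)}^{+1/(2M)} du \cdot \Pr\left\{|U-V| \ge \frac{1}{2M} \,\Big|\, U = \frac{2i+1}{2M} - \frac{1}{2} + u\right\} \\
&= \frac{1}{4M}\int_{-1/(2M)}^{+1/(2M)} du \cdot \frac{1}{M}\sum_{i=0}^{M-1} \Pr\left\{|U-V| \ge \frac{1}{2M} \,\Big|\, U = \frac{2i+1}{2M} - \frac{1}{2} + u\right\}.
\end{aligned} \tag{D.1}$$

Now, note that the integrand of the last expression has a simple interpretation: Consider the
codebook of signals $\{x(t,u_i),\ 0 \le t < T\}$, $i = 0,1,\ldots,M-1$, where $u_i = (2i+1)/(2M) - 1/2 + u$,
and consider the (suboptimum) decoder that first estimates $U$ by an arbitrary estimator $V$ and
then decodes the message according to the $u_i$ that is nearest to $V$. The integrand in the last line
above is simply the probability of error of that decoder. This probability of error is lower bounded
[17, p. 174, eqs. (3.73) and (3.75)] according to

$$\frac{1}{M}\sum_{i=0}^{M-1} \Pr\left\{|U-V| \ge \frac{1}{2M} \,\Big|\, U = \frac{2i+1}{2M} - \frac{1}{2} + u\right\} \ge \frac{1}{2}Q\left(\sqrt{\frac{E}{N_0} \cdot \frac{M/2}{M/2 - 1}}\right) = \frac{1}{2}Q\left(\sqrt{\frac{E}{N_0} \cdot \frac{M}{M-2}}\right), \tag{D.2}$$

where now $M/2$ should be an integer at least as large as 2, namely, $M = 4, 6, 8, \ldots$. Thus,

$$\mathbf{E}(U-V)^2 \ge \frac{1}{8M^2}\,Q\left(\sqrt{\frac{E}{N_0} \cdot \frac{M}{M-2}}\right). \tag{D.3}$$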

Appendix E 

Channel Coding Bound for the AWGN Fading Channel 

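For a given value of the fading parameter $A = a$, the earlier derivation of the channel-coding
bound implies

$$\mathbf{E}(U-V)^2 \ge \frac{1}{8M^2}\,Q\left(\sqrt{\frac{a^2E}{N_0} \cdot \frac{M}{M-2}}\right). \tag{E.1}$$

Averaging over $A$ and using Craig's formula (see, e.g., [16]), we have

$$\begin{aligned}
\mathbf{E}(U-V)^2 &\ge \frac{1}{8M^2}\int_{-\infty}^{+\infty} da\, \frac{e^{-a^2/(2\sigma^2)}}{\sqrt{2\pi\sigma^2}}\, Q\left(\sqrt{\frac{a^2E}{N_0} \cdot \frac{M}{M-2}}\right) \\
&= \frac{1}{8\pi M^2}\int_{-\infty}^{+\infty} da\, \frac{e^{-a^2/(2\sigma^2)}}{\sqrt{2\pi\sigma^2}}\int_0^{\pi/2} d\theta \cdot \exp\left\{-\frac{a^2EM}{2(M-2)N_0\sin^2\theta}\right\} \\
&= \frac{1}{8\pi M^2}\int_0^{\pi/2} d\theta\int_{-\infty}^{+\infty} da\, \frac{e^{-a^2/(2\sigma^2)}}{\sqrt{2\pi\sigma^2}} \cdot \exp\left\{-\frac{a^2EM}{2(M-2)N_0\sin^2\theta}\right\} \\
&= \frac{1}{8\pi M^2}\int_0^{\pi/2} \frac{d\theta}{\sqrt{1 + \sigma^2EM/[(M-2)N_0\sin^2\theta]}} \\
&= \frac{1}{8\pi M^2}\int_0^{\pi/2} \frac{d\theta\,\sin\theta}{\sqrt{\sin^2\theta + E\sigma^2M/[N_0(M-2)]}} \\
&\ge \frac{1}{8\pi M^2}\int_0^{\pi/2} \frac{d\theta\,\sin\theta}{\sqrt{1 + \sigma^2EM/[N_0(M-2)]}} \\
&= \frac{1}{8\pi M^2} \cdot \frac{1}{\sqrt{1 + \sigma^2EM/[N_0(M-2)]}}.
\end{aligned} \tag{E.2}$$

For $E/N_0$ large, this is approximately

$$\frac{1}{8\pi\sigma\sqrt{E/N_0}}\sqrt{\frac{M-2}{M^5}},$$

which is maximized (for even $M > 2$) by $M = 4$ to yield

$$\liminf_{E/N_0\to\infty} \sqrt{\frac{E}{N_0}} \cdot \mathbf{E}(U-V)^2 \ge \frac{1}{128\pi\sqrt{2}\,\sigma} = \frac{0.001758}{\sigma}. \tag{E.3}$$

Appendix F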



Derivation of the Chazan Zakai Ziv Bound 

The CZZB g] asserts that 

$$\mathbf{E}(U-V)^2 \ge \int_0^1 dh \cdot h\int_{-1/2}^{1/2-h} du \cdot P_e(u,u+h), \tag{F.1}$$

where $P_e(u,u+h)$ is the probability of error associated with optimum hypothesis testing between
the hypotheses $y(t) = Ax(t,u) + n(t)$ and $y(t) = Ax(t,u+h) + n(t)$, assuming equal priors.
Let us denote the probabilities of error of the two kinds by $P_e(u \to u+h)$ and $P_e(u+h \to u)$.
Then, according to the Shannon-Gallager-Berlekamp theorem [17, p. 159, Theorem 3.5.1], for every
$s \in [0,1]$, at least one of the two following inequalities must hold:

$$P_e(u \to u+h) > \frac{1}{4}\exp\left[\mu(s,h) - s\mu'(s,h) - s\sqrt{2\mu''(s,h)}\right] \triangleq A(s) \tag{F.2}$$

$$P_e(u+h \to u) > \frac{1}{4}\exp\left[\mu(s,h) + (1-s)\mu'(s,h) - (1-s)\sqrt{2\mu''(s,h)}\right] \triangleq B(s) \tag{F.3}$$

where $\mu'(s,h)$ and $\mu''(s,h)$ denote the first two partial derivatives of $\mu(s,h)$ w.r.t. $s$, and where
for rapidly-vanishing-correlation signals, $\mu(s,h)$ is given by the (first line of) eq. (74). Since
$\mu(1/2,h) \approx \ln[\sqrt{2}/(\sigma\sqrt{E/N_0})]$, $\mu'(1/2,h) = 0$ and $\mu''(1/2,h) \approx 4$ at the high SNR limit, this implies
that

$$\begin{aligned}
P_e(u,u+h) &= \frac{P_e(u \to u+h) + P_e(u+h \to u)}{2} \\
&\ge \sup_{0 \le s \le 1} \frac{1}{2}\min\{A(s),B(s)\} \\
&\ge \frac{1}{2}\min\{A(1/2),B(1/2)\} \\
&= \frac{1}{8}\exp\left\{\mu(1/2,h) - 0.5\sqrt{2\mu''(1/2,h)}\right\} \\
&\approx \frac{\sqrt{2}}{8e^{\sqrt{2}}\,\sigma\sqrt{E/N_0}} = \frac{0.042977}{\sigma\sqrt{E/N_0}},
\end{aligned} \tag{F.4}$$

and so,

$$\mathbf{E}(U-V)^2 \ge \frac{0.042977}{\sigma\sqrt{E/N_0}}\int_0^1 h(1-h)\,dh = \frac{0.00716}{\sigma\sqrt{E/N_0}}. \tag{F.5}$$
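The high-SNR values of $\mu'(1/2,h)$ and $\mu''(1/2,h)$, and the resulting constants 0.042977 and 0.00716, can be checked by finite differences. The sketch below assumes that $\mu(s,h)$ for $h \ne 0$ is the logarithm of the first line of (74), written in terms of $a = \sigma^2E/N_0$; the function names, the step size and the value of $a$ are illustrative choices.

```python
import math

def mu(s: float, a: float) -> float:
    """mu(s, h) for h != 0 from the first line of (74), with a = sigma^2 E / N0."""
    return 0.5 * (math.log(1 + 2 * a)
                  - math.log(1 + 2 * s * a)
                  - math.log(1 + 2 * (1 - s) * a))

def derivatives_at_half(a: float, eps: float = 1e-4):
    """Central-difference estimates of mu'(1/2) and mu''(1/2)."""
    lo, mid, hi = mu(0.5 - eps, a), mu(0.5, a), mu(0.5 + eps, a)
    return (hi - lo) / (2 * eps), (hi - 2 * mid + lo) / (eps * eps)

a = 1.0e6                                         # high SNR
d1, d2 = derivatives_at_half(a)
scaled = math.exp(mu(0.5, a)) * math.sqrt(a)      # tends to sqrt(2) as a grows
pe_constant = scaled * math.exp(-0.5 * math.sqrt(2 * d2)) / 8
mse_constant = pe_constant / 6                    # integral of h (1 - h) over [0, 1] is 1/6
print(d1, d2, pe_constant, mse_constant)
```

Under these assumptions the second derivative tends to 4 (its exact value at $s = 1/2$ is $4a^2/(1+a)^2$), which is the value that reproduces the constants 0.042977 and 0.00716.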